The motivation of this paper is to find the most accurate technique for predicting
ground-level ozone at Al Jahra station, Kuwait. Data on meteorological
variables (air temperature, relative humidity, solar radiation, wind direction,
and wind speed) and on the concentrations of seven environmental pollutants
(SO2, NO2, NO, CO2, CO, NMHC, and CH4) were used to forecast the
atmospheric ozone concentration. In this report, three methods (PLS regression,
support vector machine (SVM), and multiple least-squares regression) were
used to predict ground-level ozone. Fifteen parameters were used to evaluate
the performance of the methods. Multiple least-squares regression, partial
least squares regression (PLS regression), and SVM with linear and radial
kernels were the best performers, with mean absolute errors (MAE) of
9.17 x 10^-3, 9.72 x 10^-3, 9.64 x 10^-3, and 9.12 x 10^-3, respectively.
SVM with a polynomial kernel had an MAE of 5.46 x 10^-2. These
results show that these methods could be used to predict ground-level ozone
concentrations at Al Jahra station in Kuwait.
1. INTRODUCTION

This paper aims to identify the most accurate technique for forecasting ground-level ozone. Kuwait
has a number of urban areas where air pollution has grown dramatically because of industrialization,
technical development, and overpopulation [1]. These factors contaminate the air and intensify
pollution in densely populated areas. Residents of these areas might experience severe health problems
in the near future because of the pollutants in their immediate vicinity. The ozone layer has
reportedly been severely damaged by pollution, and this has become a matter of concern as pollutant
emissions from developed and industrialized nations increase. Air pollution can have more drastic
effects on populations living close to the industrial estates that cause it. It can be dangerous
for the health of all living things; in humans it is associated with chronic respiratory infection,
aggravated asthma, lung inflammation, damaged lung defense mechanisms, and decreased immunity [2].

Disturbance of the atmospheric balance has become the major cause of damage to the ozonosphere.
This damage might lead to severe foliar injuries and to reductions in biomass production and
agricultural yield. A considerable shift in competitive advantage among plant species in mixed
populations has also been observed [3]. Damage to the ozone layer is steadily increasing, which is
dangerous for human health and the environment [4]. According to scientists, the ozone layer consists
of O3, and its depletion results from reactions driven by ultraviolet radiation and from chemical
interaction with oxides of nitrogen (NOx) and organic species [5].

These primary pollutants generally originate from industry and from other sources such as urban
development. Without protection, exposure to ozone might create health problems, and because ozone is
highly reactive it also damages chemicals, rubber materials, and fabrics. It severely affects some
crops as well. The ozone concentration in the lower atmosphere has therefore received considerable
attention because of its role in oxidizing photochemical smog. As a key index substance of
photochemical smog, ozone is considered a major pollutant affecting the quality of air and
environment [6], [7].

Two main chemical precursors of ozone and other photochemical oxidants have been identified:
hydrocarbons (HC) and nitrogen oxides (NOx) [6]. Petrochemical processes, fuel and oil combustion,
and transportation are the major sources of atmospheric HCs and NOx. Most of these pollutants are
quantified by emission rates estimated from the activities taking place in urban and industrial
areas. Relating these primary pollutants to meteorological factors might help determine ozone
levels. A model that describes the chemical processes and atmospheric movements might be an
appropriate strategy, but the chemistry of organic species is difficult to capture accurately and
ozone formation is complicated. The stratospheric ozone layer protects the earth from ultraviolet
rays that harm both humans and crops. Ozone in the lower atmosphere, however, is the main concern in
terms of health problems and effects on materials and vegetation [8]. Ozone inhalation can trigger
numerous health problems such as respiratory tract irritation, eyesight problems, coughing, chest
tightness, and wheezing. Children who spend time outdoors during the daytime in summer are likely to
be at risk when the ozone concentration in the air is high. Ozone is also a cause of losses in
agricultural production.

In the recent past, air pollution in the urban areas of the country has increased heavily [9]. This is
due to overpopulation, technical growth, and rapid industrialization. The ozone levels observed in the
residential district of Salmiya exceeded the ambient air quality standards during certain times of
the year. There is therefore a pressing need for accurate forecasting of surface ozone, since such
forecasts would support the successful implementation of public warning strategies, especially on
episodic days.

Statistical methods that relate ozone concentration to meteorology fall into three broad areas, each
considerably different from the others: regression-based methods, extreme-value methods, and
space-time approaches. The variability of ozone is not yet fully understood and is commonly examined
with the help of meteorological adjustment. A change in climate or in policy ultimately changes the
underlying process. Such changes might be small and hard to identify, so effort is needed to
separate them from the effects of weather and climate [10]. Regression-based and extreme-value
methods both concentrate on forecasting, estimation, and revealing the fundamental mechanisms.

Similarly, studies analyzing ozone levels have focused on comparing them with international standard
limits. This comparison includes the study of seasonal trends in ozone levels, the understanding of
the diurnal behavior of ozone, and the assessment of the health effects of ozone
pollution [11], [12].

Few studies have developed a robust forecasting system that can be used for public warnings. Most
forecasting systems were developed to predict ambient ozone concentrations in Kuwait from precursor
concentrations and meteorological data [11], [12].

In [1], stepwise multiple regression modeling was used to predict ozone levels from meteorological
conditions and precursor concentrations at the Shuaiba Industrial Area (SIA) of Kuwait during
daylight hours.

Artificial neural networks and principal component regression have been applied to predict the ozone
concentration in the lower atmosphere of Kuwait. The prediction used five meteorological variables
(air temperature, relative humidity, solar radiation, wind direction, and wind speed) and the
concentrations of seven environmental pollutants (SO2, NO2, NO, CO2, CO, NMHC, and CH4).

Because linear regression is a well-known method, a number of studies, such as [1], [13-16], have
used multiple linear regression to relate ozone measurements to contemporaneous meteorological
measurements. These models do not account for autocorrelation and cross-correlation, and therefore
do not fulfill a basic requirement of scientific merit. Time series regression is a more complex
alternative that combines the static relationship with a correlation structure, such as a simple
AR(1), for the residuals [17]. It is worthwhile only when the fits are checked with appropriate
diagnostics. Although this technique is robust, these methods might be inappropriate for capturing
the interactions and the nonlinear response of ozone concentration.

In [18], the performance of various forecasting systems was compared across different locations in
Kuwait, using fuzzy modeling and time series analysis. Both forecasting models showed a significant
improvement over the model currently used for forecasting air pollution, i.e. the pure persistence
forecast. A large proportion of ozone variation is explained by the daily maximum temperature.

According to [19], statistical linear models have proven too limited to capture the multifaceted
association between ozone and meteorological variables. To develop a parametric nonlinear model, the
authors used data from around forty-five monitoring stations in the Chicago region covering 1981 to
1991, gathered from the AIRS database. They modeled the daily median across sites of the maximum
one-hour average ozone values using nonlinear least squares. Exploratory graphical evidence and
nonparametric modeling revealed relationships between contemporaneous ozone and variables such as
relative humidity, surface temperature, and 700 hPa wind speed, which suggested the parametric forms.
A Fourier series was used to model the seasonal cycle.

The authors calculated the standard errors of the coefficients and confirmed the presence of serial
autocorrelation in the model residuals. They accordingly applied suitable adjustments using Gallant's
method. In [20], stepwise multiple linear regression was used to find the best-fit equation relating
the maximum daytime ozone concentration to the meteorological conditions along a twenty-four-hour
upwind air parcel trajectory. Four variables entered the equation: the maximum upwind ozone on the
previous day, the maximum upwind temperature on the previous day, and the average upwind wind speed
in the upper and the lower layer. The upwind emission rates of hydrocarbons and nitrogen oxides,
which form ozone in the lower layer, were also examined but did not improve the multiple correlation
coefficient.

Many evaluations of the variables used in meteorological adjustment are stepwise and do not examine
all possible subsets, which is computationally very difficult with a large number of variables.
Stepwise selection lacks a global view of model selection, which causes problems when a variable
that is eliminated early has important interactions with other variables: those variables are
masked and later dropped from the model. [21] applied a different strategy for linear models in
assessing the health effects of particulate matter and air pollution. The approach assigns prior
probabilities of inclusion to the candidate variables and then quantifies model uncertainty through
posterior probabilities over a vast number of models.

It is difficult to predict ozone levels using theoretical methods (for example, a detailed
atmospheric diffusion model). Developing a forecasting system therefore requires empirical analysis.
A well-evaluated ozone forecasting model raises the chances of a successful control strategy. In
addition, daily forecasts of maximum ozone concentrations would help reduce and avoid ozone-related
damage and injuries. The present research is important in this respect because it compares different
statistical methods for predicting ground-level ozone at Al Jahra station, Kuwait.

Because of its major significance to atmospheric chemistry, ozone has been widely studied for
several years, both theoretically and experimentally [21]. Despite being only a triatomic molecule,
ozone has proven highly difficult for theory to describe kinetically, spectroscopically, and
dynamically. There is considerable divergence between predicted and measured low-temperature rates
of the O + O2 isotope exchange [23]. At low temperature, experimental rates are three to five times
greater than predictions and show a negative temperature dependence that has proven hard to
reproduce theoretically. In the troposphere, the ozone level is of great significance because of its
negative impact on vegetation, materials, and human health.

According to [24], ground-level ozone is produced mainly from its precursors, NOx and volatile
organic compounds (VOC), through complicated photochemical reactions in sunlight. The accumulation
of ozone at ground level is affected by physical and chemical processes as well as by meteorological
conditions. Atmospheric pollution levels show complex variability over numerous temporal and spatial
scales, with harmful impacts on the environment [25].

The current state of progress in measurement studies and in the modeling of ozone precursors,
transport processes, and photochemical behavior has recently been evaluated [26]. Although ozone
chemistry has been widely examined in chamber experiments and in photochemical modeling
analyses [27], it is still difficult to forecast ambient ozone levels and their spatial
distribution, behavior, and related patterns accurately. Tracking and predicting ozone requires an
understanding not only of ozone itself but also of the conditions that contribute to its formation.
It is essential to use models that describe and help explain the complicated associations between
ozone levels and the variables that cause or hinder ozone production.

Photochemical models are sometimes used to predict ozone variations over time, and they help develop
cost-effective ways of reducing ambient ozone, particularly by controlling the emissions of NOx and
VOC from different sources [28]. Ozone formation varies by hour, day, and season because the
complicated series of reactions is driven by sunlight and temperature. For predicting ozone levels,
a survey of the role of meteorology in surface ozone levels has been presented, and linear
regression methods have been employed [29]. In the United States and other industrialized countries,
ground-level ozone pollution is a severe health issue in several cities.

For ozone-sensitive individuals, specifically people who suffer from respiratory diseases such as
asthma, high summertime ozone levels may cause distress [30]. In most cities, public forecasts or
announcements of potentially unhealthy ozone air quality may be of great benefit to those at risk
of respiratory discomfort [31]. Moreover, “ozone action” programs have been introduced in several
cities to control episodic emissions because of their health effects. These actions rely on
forecasts of ground-level ozone.


2. THE PROPOSED METHOD 
2.1. Datasets 

The datasets used in this report were described in earlier work on public warning systems for
forecasting ambient ozone pollution in Kuwait. Data from 2006 to 2008 were used to train the
methods, and data from 2009 and 2010 were used to test them. The datasets were pre-processed in
Microsoft Excel, using information from [32], before analysis.
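
As an illustration of the year-based split described above, the following minimal sketch separates
records into a training set (2006 to 2008) and a test set (2009 and 2010). It is written in Python
for illustration only; the function name `split_by_year`, the record layout, and the sample values
are assumptions and are not taken from the original dataset.

```python
from datetime import date

def split_by_year(records, train_years, test_years):
    """Split (date, value) records into train and test lists by calendar year."""
    train = [r for r in records if r[0].year in train_years]
    test = [r for r in records if r[0].year in test_years]
    return train, test

# Hypothetical records: (measurement date, ozone value). The values are made up.
records = [
    (date(2006, 7, 1), 0.031),
    (date(2007, 1, 15), 0.012),
    (date(2008, 8, 3), 0.035),
    (date(2009, 2, 9), 0.015),
    (date(2010, 6, 21), 0.029),
]

train, test = split_by_year(records, {2006, 2007, 2008}, {2009, 2010})
```

Splitting by calendar year keeps the test period strictly after the training period, so the
evaluation resembles a genuine forecast.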


2.2. Methods 
2.2.1. Multiple Least-squares Regression 

Multiple least-squares regression (MLR) models the relationship between the dependent variables (Y)
and the independent variables (X), as shown in Equation (1).


Y=XB+E (1) 
where B is the matrix of regression coefficients and E contains the residuals. 
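
A minimal sketch of fitting Equation (1) by ordinary least squares is given below. It uses Python
with NumPy for illustration, whereas the analyses in this paper were run in R. The helper names
`fit_mlr` and `predict_mlr` and the synthetic data are assumptions. An explicit intercept column is
added to X, which is one common way to obtain an intercept term such as the one reported in Table 1.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = [1, X] b + e; returns b with the intercept first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict with coefficients returned by fit_mlr."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Synthetic demonstration data (not the Al Jahra measurements).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 0.02 + X_demo @ np.array([0.5, -0.3, 0.1]) + rng.normal(scale=1e-3, size=200)

coef_demo = fit_mlr(X_demo, y_demo)
pred_demo = predict_mlr(coef_demo, X_demo)
```

With low-noise synthetic data the estimated coefficients lie close to the values used to generate
the response, which is a quick sanity check on the fit.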


2.2.2. PLS Regression 

PLS regression models the relationship between a set of predictor variables, X, and a set of
dependent variables, Y. In the first stage of PLS regression, weight vectors w and q are derived
from X and Y, respectively, with corresponding scores t = Xw and u = Yq. Next, the inner
relationship r is determined by a least-squares regression between u and t. Finally, rank-one
reductions of X and Y are performed such that X_{j+1} = X_j - t_j p_j^T and
Y_{j+1} = Y_j - r_j t_j q_j^T, where p_j = X_j^T t_j / (t_j^T t_j) and j is the latent variable
being calculated. These stages are repeated until the desired number of latent variables (K) has
been extracted. The general model is given by Equation (2) [33].


Y = \sum_{j=1}^{K} r_j X_j w_j q_j^T + E (2)
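
The stages above can be sketched for the single-response case, in which Y is a vector y, q_j reduces
to 1, and the inner relationship r_j is the regression of y on t_j. The sketch below is a NIPALS-style
PLS1 written in Python with NumPy for illustration; it is not the implementation of the R pls package
used in this paper, and the name `fit_pls1` and the synthetic data are assumptions.

```python
import numpy as np

def fit_pls1(X, y, n_components):
    """Single-response PLS (PLS1); returns (coef, x_mean, y_mean)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xj, yj = X - x_mean, y - y_mean          # centered working copies
    W, P, r = [], [], []
    for _ in range(n_components):
        w = Xj.T @ yj
        w = w / np.linalg.norm(w)            # weight vector w_j
        t = Xj @ w                           # score t_j = X_j w_j
        tt = t @ t
        p = Xj.T @ t / tt                    # loading p_j = X_j^T t_j / (t_j^T t_j)
        rj = (yj @ t) / tt                   # inner relationship r_j
        Xj = Xj - np.outer(t, p)             # rank-one reduction of X
        yj = yj - rj * t                     # rank-one reduction of y
        W.append(w)
        P.append(p)
        r.append(rj)
    W, P, r = np.array(W).T, np.array(P).T, np.array(r)
    coef = W @ np.linalg.solve(P.T @ W, r)   # coefficients in the original X space
    return coef, x_mean, y_mean

# Synthetic demonstration data (not the Al Jahra measurements).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(120, 4))
y_demo = X_demo @ np.array([0.4, -0.2, 0.1, 0.05]) + rng.normal(scale=0.01, size=120)

coef_full, x_mean, y_mean = fit_pls1(X_demo, y_demo, n_components=4)
ols_coef, *_ = np.linalg.lstsq(X_demo - x_mean, y_demo - y_mean, rcond=None)
```

When the number of latent variables equals the number of predictors, the PLS1 coefficients coincide
with the ordinary least-squares solution, which gives a convenient check; with fewer latent
variables the fit is regularized.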


2.2.3. SVM 

SVMs are widely used because of their use of kernel functions to represent data. The kernel
functions available for SVMs include polynomial, linear, sigmoid, and radial basis function (RBF)
kernels [34]. In this paper, linear, radial, and polynomial kernels were used for training and
prediction.
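
For reference, the three kernels used here can be written as short functions. The sketch below
follows the forms commonly used by libsvm-based software: linear u'v, polynomial
(gamma u'v + coef0)^degree, and radial exp(-gamma |u - v|^2). It is written in plain Python for
illustration, and the default parameter values are illustrative assumptions rather than the tuned
values of this study.

```python
import math

def linear_kernel(u, v):
    """Linear kernel: the inner product u'v."""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * u'v + coef0) ** degree."""
    return (gamma * linear_kernel(u, v) + coef0) ** degree

def radial_kernel(u, v, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * |u - v|^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)
```

The parameter gamma appears in both the polynomial and the radial kernel, which is why it is tuned
together with the cost parameter.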


2.3. Statistical Analysis 
R 3.1.2 was used to perform the statistical analyses, with the packages pls and e1071.
The performance of the methods was assessed using the mean absolute error.
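
The mean absolute error is the average of the absolute differences between the observed and the
predicted values, MAE = (1/n) * sum(|y_i - yhat_i|). A minimal Python sketch is given below; the
function name and the sample values are illustrative assumptions.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up values in ppm, for illustration only.
mae_demo = mean_absolute_error([0.020, 0.010, 0.030], [0.025, 0.010, 0.020])
```

Because the error is expressed in the units of the response, it can be compared directly with the
ozone values to which it refers.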


2.4. Experimental Setting 
2.4.1. Multiple Least-square Regression 

MLR was applied to the normalized data using Equation (1). The regression coefficients and the
p-values from training the MLR model are given in Table 1. All variables were significant at
p < 0.001. Variables with positive regression coefficients were SO2, NO, NO2, WG, SOLAR, TEMP-AMB,
and CO2. Variables with negative regression coefficients were NOX, PM10, CO, CH4, NCH4, WS,
TEMP-IND, and Wind-deg. The variable NO had the largest positive effect on ozone, with a
regression coefficient of 0.5219, and the variable NOX had the largest negative effect on ozone,
with a regression coefficient of -0.5369 (Table 1).


Table 1. Regression coefficients and p-values of the predictor variables from training dataset (2006 to 2008)


              Estimate     Std. Error   t value    Pr(>|t|)
(Intercept)    0.0122023   0.0008534     14.299    <2e-16
SO2            0.0452040   0.0028250     16.001    <2e-16
NO             0.5218786   0.0390372     13.369    <2e-16
NOX           -0.5368621   0.0398481    -13.473    <2e-16
NO2            0.0582727   0.0078268      7.445    1.01e-13
PM10          -0.0058920   0.0008625     -6.831    8.67e-12
CO            -0.0039184   0.0010157     -3.858    0.000115
CH4           -0.0130075   0.0013198     -9.856    <2e-16
NCH4          -0.0077966   0.0014514     -5.372    7.89e-08
WS            -0.0295952   0.0048163     -6.145    8.17e-10
WG             0.0384623   0.0049445      7.779    7.70e-15
SOLAR          0.0123449   0.0003574     34.540    <2e-16
TEMP-IND      -0.0088203   0.0005373    -16.415    <2e-16
TEMP-AMB       0.0202759   0.0003742     54.181    <2e-16
CO2            0.0187256   0.0029063      6.443    1.20e-10
Wind-deg      -0.0020436   0.0003279     -6.233    4.67e-10


2.4.2. PLS Regression 
With PLS regression, the question is how many components to retain so that the model does not fit
noise during training. In this report, the root mean square error of prediction (RMSEP) was used to
select the number of PLS regression components (Figure 1). Five PLS regression components were
selected and used for training the method. The resulting model was then used to predict the
ground-level ozone concentrations for the test dataset (2009 and 2010). The model's mean absolute
error was 9.72 x 10^-3 (Table 2).

Figure 1. Scree plot of PLS regression components. Five PLS regression components were selected and
used to build the model.
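
The selection of the number of components by RMSEP can be sketched as follows. The paper does not
state the exact selection rule, so the sketch assumes the simplest one: take the smallest number of
components that attains the minimum RMSEP. The function names and the RMSEP values below are
hypothetical and do not reproduce Figure 1.

```python
import math

def rmsep(actual, predicted):
    """Root mean square error of prediction for paired values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def select_n_components(rmsep_by_k):
    """Return the smallest component count whose RMSEP equals the minimum."""
    best = min(rmsep_by_k.values())
    return min(k for k, value in rmsep_by_k.items() if value == best)

# Hypothetical RMSEP values per number of components (made up for illustration).
rmsep_by_k = {1: 0.0210, 2: 0.0164, 3: 0.0131, 4: 0.0133, 5: 0.0131}
chosen_k = select_n_components(rmsep_by_k)
```

In practice the RMSEP curve is often inspected visually as well, since a curve that flattens out can
justify a smaller number of components than the strict minimum.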


Table 2. Mean absolute error of MLR, PLS, SVM (linear, radial, and polynomial)


Method            Mean absolute error
MLR               9.17 x 10^-3
PLS regression    9.72 x 10^-3
SVM linear        9.64 x 10^-3
SVM radial        9.12 x 10^-3
SVM polynomial    5.46 x 10^-2


2.4.3. SVM

SVM was trained using three kernels (linear, radial, and polynomial). A grid search was used to
select the best parameters (gamma and cost) for the linear, radial, and polynomial kernels. The
training dataset was split into two sets, the first for training and the second for validation.
Table 3 shows the results of the grid search for the linear, radial, and polynomial kernels.

The best gamma parameters for linear, radial, and polynomial were 1 x 10%, 1.0 x 10%, and
1.2 x 10°, respectively; and the best cost parameters for linear, radial, and polynomial were 1, 1 x 10°, and
100, respectively.
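
The tuning procedure described above, a hold-out split combined with a grid over gamma and cost, can
be sketched generically. In the sketch below, `score_fn` stands for "train on the first split with
the given gamma and cost, then return the mean squared error on the validation split". The function
`toy_validation_mse` is a hypothetical stand-in for that step, and the grid values are illustrative
assumptions, not the grid used in this study.

```python
from itertools import product

def grid_search(score_fn, gammas, costs):
    """Return (best_mse, best_gamma, best_cost) over the gamma x cost grid."""
    best = None
    for gamma, cost in product(gammas, costs):
        mse = score_fn(gamma, cost)          # validation error for this pair
        if best is None or mse < best[0]:
            best = (mse, gamma, cost)
    return best

def toy_validation_mse(gamma, cost):
    """Hypothetical validation error, smallest at gamma = 0.01 and cost = 10."""
    return (gamma - 0.01) ** 2 + 1e-6 * (cost - 10) ** 2

best_mse, best_gamma, best_cost = grid_search(
    toy_validation_mse, gammas=[1e-3, 1e-2, 1e-1], costs=[1, 10, 100]
)
```

The number of model fits grows with the product of the grid sizes, which is one reason why tuning
kernels with several parameters is computationally expensive.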

The mean square errors for predictions were 1.02 x 10%, 5.01 x 10°, and 1.9 x 10° for linear, radial, 
and polynomial kernels, respectively. 

The best parameters obtained from tuning the kernels were used to train the methods. The mean
absolute errors of the SVMs trained with the three kernels were 9.64 x 10^-3, 9.12 x 10^-3, and
5.46 x 10^-2 for the linear, radial, and polynomial kernels, respectively (Table 2).


Table 3. Results obtained from tuning the three SVM kernels (linear, radial, and polynomial)


Kernel Gamma Cost Mean squared error 
SVM linear 1x10” 1 1.02 x 10 
SVM radial 1.0 x 10% 1x10” 5.01 x 10% 
SVM polynomial 1.2 x 10% 1x 10” 1.9 x 10° 


3. RESULTS AND DISCUSSIONS 
Table 2 and Figure 2 show the performance of the three methods (MLR, PLS regression, and SVMs)
on our test dataset.

Figure 2. Monthly actual values of ground-level ozone and the values predicted by MLR, PLS, SVM
linear, SVM radial, and SVM polynomial. MLR: multiple least-squares regression; PLS: PLS
regression; SVM linear: SVM using linear kernel; SVM radial: SVM using radial kernel; and SVM
polynomial: SVM using polynomial kernel


The trained MLR model was used to predict ground-level ozone concentrations for the test dataset
(2009 and 2010). The model's mean absolute error was 9.17 x 10^-3 (Table 2).

MLR, PLS, and SVM with linear and radial kernels performed better than SVM with a polynomial
kernel in predicting the test dataset, as indicated by their lower MAE (mean absolute error).
Additionally, our results suggest a strong linear relationship between the ground-level ozone
concentrations and the predictor variables.

The monthly value of the actual data was 0.02 parts per million (ppm) in January 2009. The SVM
polynomial prediction was below the actual value and lay between the multiple least-squares
regression prediction and the actual data, at the same position as the SVM radial prediction. The
multiple least-squares regression prediction was at 0.01 ppm.

In February, the actual data show a decreasing trend. While multiple least-squares regression
shows a rapid increase, SVM polynomial decreases to 0.01 ppm, the same point as SVM radial. The
PLS regression prediction for February is also at 0.01 ppm.
In March, the most rapid increases were shown by multiple least-squares regression and PLS. The
actual data are still decreasing, but at a slow rate. SVM polynomial is higher than in February,
and SVM radial shows almost the same level of ground-level ozone as SVM polynomial. SVM linear
shows a decrease in March.

At the start of April, the actual data return to their January level, i.e. 0.0200 ppm of
ground-level ozone. Multiple least-squares regression also shows a decrease. PLS regression reaches
the 0.01 ppm level this month, far below its value in the previous month, which was more than
0.02 ppm. SVM linear increases slightly, whereas SVM radial decreases very slightly from the
previous month. SVM polynomial reaches the 0.01 ppm level of ground-level ozone this month.

From this point on, the actual data and the predictions of all three methods increase until
August. The actual data also attain their highest position in 18 months after August. From August
onward, every series tends to decrease, especially the actual data. Meanwhile, only SVM polynomial
holds steady, with very little fluctuation. After September, the values fall very rapidly,
especially SVM polynomial. The ground-level ozone concentration decreases in every series until
November, when multiple least-squares regression is the highest among the methods, at 0.02 ppm.

In November, SVM linear is the lowest. The decrease does not stop here but continues, with some
fluctuations across the methods, until January. January shows the lowest concentrations of
ground-level ozone, which places it outside the periodic season. SVM radial reaches the lowest
value of all, 0 ppm, in this month. SVM polynomial is not much different.

It differs only slightly from SVM radial, lying near 0 ppm. From January, PLS falls rapidly and
reaches 0 ppm in February. Only SVM polynomial rises more steeply than the others, reaching
0.018 ppm in this month. With the start of March, ground-level ozone begins to increase. Only SVM
radial remains steady, with very little difference in its readings from the previous three to four
months. This marks the start of the season of higher ground-level ozone concentration. From this
point on, every method shows an increasing trend with slight fluctuations, which is sustained until
mid-May. After May, SVM polynomial has a fluctuating trend among others while SVM polynomial and
SVM radial are rather stable. This situation continues until September.

Studies have compared SVM prediction performance across the branches of the atmospheric sciences,
such as meteorology, atmospheric physics, and atmospheric chemistry, as well as weather forecasting.
These studies support our findings, in which SVM proved to be a robust tool for prediction;
examples can be found in [35-39].


4. CONCLUSIONS AND SUGGESTIONS 

The adverse effects of ozone can spread over long distances and across a wide area, depending on
wind direction and speed, and can harm the health of inhabitants in various ways.

In this report, three methods (MLR, PLS regression, and SVMs) were effective in predicting
ground-level ozone concentrations at Al Jahra station in Kuwait. MLR, PLS regression, and
SVM using linear and radial kernels performed better than SVM using a polynomial kernel. SVM tuning
is computationally expensive, especially for the radial and polynomial kernels.

The analysis shows that the concentration of ground-level ozone (O3) varies with season, which
reflects the impact of the predictor variables on the concentration. Usually, the O3 concentration
in the air is low in the first four months of the year. It increases from April through July and
August, and then decreases until January and February. In this study, the ozone concentration was
predicted by multiple least-squares regression, PLS regression, and SVM.

The data for this research covered eighteen consecutive months, from January 2009 to
September 2010. The statistical methods used to forecast the ground-level ozone concentration show
that the concentration varies from month to month. Ozone concentration responds directly to minor
changes in rainfall, humidity, or temperature. The study also shows that the methods give different
predictions of ozone concentration, and some of them differ considerably. The present research
addresses prediction at a specific site, Al Jahra station in Kuwait, and the work will also be
helpful for future forecasts of ozone concentration.